Locality in Search Engine Queries and Its Implications for Caching

نویسندگان

  • Yinglian Xie
  • David R. O'Hallaron
چکیده

Caching is a popular technique for reducing both server load and user response time in distributed systems. In this paper, we are interested in the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis results show that: (1) Queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. (2) If caching is to be done at the user side, short-term caching for hours will be enough to cover query temporal locality, while server/proxy caching should be based on longer periods such as days. (3) Most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on user lexicon looks promising. Effort sponsored in part by the Advanced Research Projects Agency and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-96-1-0287, in part by the National Science Foundation under Grant CMS-9980063, and in part by a grant from the Intel Corporation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On Caching Search Engine Results

In this paper we explore the problem of Caching of Search Engine Query Results in order to reduce the computing and I/O requirements needed to support the functionality of a search engine of the worldwide web. Based on traces from search engines we show that there is signiicant locality in the queries asked, that is, 20-30% of the queries have been previously submitted by the same or a diierent...

متن کامل

A Hybrid Strategy for Caching Web Search Engine Results

This work discusses the design and implementation of an efficient caching system aimed to exploit the locality present in the queries submitted to a Web Search Engine (WSE). We enhance previous proposals in several directions. First we propose the adoption of a hybrid strategy for caching, and then we experimentally demonstrate the superiority of our hybrid strategy. Further we show how to take...

متن کامل

On caching search engine query results

In this paper we explore the problem of Caching of Search Engine Query Results in order to reduce the computing and I/O requirements needed to support the functionality of a search engine of the World-Wide Web. We study query traces from the EXCITE search engine and show that they have a significant amount of temporal locality: that is, a significant percentage of the queries have been submitte...

متن کامل

A Highly Scalable Parallel Caching System for Web Search Engine Results

This paper discusses the design and implementation of SDC, a new caching strategy aimed to efficiently exploit the locality present in the stream of queries submitted to a Web Search Engine. SDC stores the results of the most frequently submitted queries in a fixed-size read-only portion of the cache, while the queries that cannot be satisfied by the static portion compete for the remaining ent...

متن کامل

Towards a Distributed Web Search Engine

We present recent and on-going research towards the design of a distributed Web search engine. The main goal is to be able to mimic a centralized search engine with similar quality of results and performance, but using less computational resources. The main problem is the network latency when different servers have to process the queries. Our preliminary findings mix several techniques, such as...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002